feat: implement DuckDB filesystem integration for Vortex file handling#6198
feat: implement DuckDB filesystem integration for Vortex file handling#6198iceboundrock wants to merge 2 commits intovortex-data:developfrom
Conversation
Merging this PR will not alter performance
Comparing Footnotes
|
0ax1
left a comment
There was a problem hiding this comment.
Thanks for taking the time to look into this. There's a couple of issues around thread safety / locking, error handling and memory leaks.
If you used AI to assist in writing the code for your PR please mention this in your PR description (see: https://github.com/vortex-data/vortex/blob/develop/CONTRIBUTING.md).
0ax1
left a comment
There was a problem hiding this comment.
Almost there, thanks for taking the time to address the comments!
66fedc0 to
b6e15a3
Compare
Hi @0ax1 , any other comments? |
Currently swamped with other work but will get back to it when I find a free moment. |
|
@iceboundrock - quick question on this. Do you know if it's possible to grab the config back out from DuckDB's filesystems? I don't want to block this PR, don't worry(!), but we do have plans to implement a high performance I/O layer at some point and it would probably make more sense to try and configure our own I/O using duckdb configs, rather than attempt to optimize calling back into DuckDB's I/O layer. |
Could you please elaborate on what We probably can do the same as the httpfs extension does. Code link: |
Signed-off-by: Ruoshi <me@ruoshi.li>
…chemas. Signed-off-by: Ruoshi <me@ruoshi.li>
20ea783 to
b2db9de
Compare
Let's handle this in a follow up to limit the scope of this PR. |
0ax1
left a comment
There was a problem hiding this comment.
PR looks great and CI (including all benchmarks) pass.
Last bit of fine-tuning and this is ready to land.
|
|
||
| try { | ||
| auto *wrapper = reinterpret_cast<FileHandleWrapper *>(handle); | ||
| wrapper->handle->Write(QueryContext(), const_cast<uint8_t *>(buffer), len, offset); |
There was a problem hiding this comment.
Drop the const_cast and adjust the signature accordingly.
| } | ||
| } | ||
|
|
||
| // SAFETY: ClientContext is an opaque pointer owned by DuckDB and remains valid for the lifetime of |
There was a problem hiding this comment.
These are actually not needed, as we pass around SendableClientContext.
Drop:
unsafe impl Send for ClientContext {}
unsafe impl Sync for ClientContext {}
| let mut out_len: cpp::idx_t = 0; | ||
| let offset = self.pos; | ||
|
|
||
| let status = unsafe { |
There was a problem hiding this comment.
Use spawn blocking, instead of a blocking write.
| vortex_err!("{message}") | ||
| } | ||
|
|
||
| pub fn duckdb_fs_glob(ctx: &ClientContext, pattern: &str) -> VortexResult<Vec<url::Url>> { |
| } | ||
|
|
||
| /// A VortexReadAt implementation backed by DuckDB's filesystem (e.g., httpfs/s3). | ||
| pub struct DuckDbFsReader { |
| } | ||
|
|
||
| impl DuckDbFsReader { | ||
| pub unsafe fn open_url( |
| unsafe impl Send for DuckDbFsReader {} | ||
| unsafe impl Sync for DuckDbFsReader {} | ||
|
|
||
| pub struct DuckDbFsWriter { |
| } | ||
|
|
||
| impl DuckDbFsWriter { | ||
| pub unsafe fn create(ctx: cpp::duckdb_vx_client_context, path: &str) -> VortexResult<Self> { |
| Ok(urls) | ||
| } | ||
|
|
||
| pub unsafe fn duckdb_fs_create_writer( |
Uh oh!
There was an error while loading. Please reload this page.